Introduction

In this report I’ll go through the details of the methodology and criteria used to determine DDR deficiency in cancer samples from HMF and PCAWG. I have also included preliminary results towards the end.

Overview

The DDR pathways and genes studied here are taken from the following paper: https://doi.org/10.1038/s43018-020-0050-6

Pleasance et al. determined deficiency in 12 DDR pathways by examining somatic alterations in 181 genes involved in those pathways, either as core or as accessory genes. Some genes are involved in more than one pathway. Of the 181 genes, 4 whose entries were absent from grch37.ensembl were dropped (“BABAM2”, “FAAP100”, “FAAP20”, “FAAP24”). These genes were involved only as accessory components, so in this study we use 177 genes in total. A table of genes and their pathway(s) can be found below:

N.B. Genes that are accessory in some pathways and core in others have more than one row.
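Because a gene can have several rows, counts such as 177 genes (or the 112 core genes used later) refer to distinct genes, not rows. The Python sketch below assumes the table is held as (gene, pathway, role) tuples; apart from the four dropped gene names, every gene, pathway membership and row in it is a hypothetical placeholder.

```python
from collections import defaultdict

# HYPOTHETICAL rows in long format: (gene, pathway, role).
# A gene that is core in one pathway and accessory in another gets one row per pathway.
rows = [
    ("GENE_A", "HR", "core"),
    ("GENE_A", "SSA", "core"),
    ("GENE_B", "HR", "accessory"),
    ("GENE_B", "FA", "core"),
    ("FAAP20", "FA", "accessory"),
]

# Genes dropped in this report because they have no entry in grch37.ensembl
DROPPED = {"BABAM2", "FAAP100", "FAAP20", "FAAP24"}

# Remove every row belonging to a dropped gene
kept = [row for row in rows if row[0] not in DROPPED]

# Distinct genes (not rows) are what the gene counts refer to
genes = {gene for gene, _, _ in kept}

# Distinct genes that are core in at least one pathway
core_genes = {gene for gene, _, role in kept if role == "core"}

# Gene -> every pathway it takes part in, whatever the role
pathways_by_gene = defaultdict(set)
for gene, pathway, _ in kept:
    pathways_by_gene[gene].add(pathway)

print(len(rows), len(kept), len(genes), sorted(core_genes))
```

Here five rows shrink to four after dropping FAAP20, and those four rows describe only two distinct genes.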

For all these genes, biallelic status was determined from copy-number information and simple mutations. Simple mutations were annotated using SnpEff and ClinVar. An overview of the scoring can be found below:

Selection criteria at the gene level: since we know the status of both alleles of each gene, we can choose whether a monoallelic or a biallelic hit counts as dysfunction. If we consider a score of 3 or more on an allele to be dysfunctional, we get 4,986 cases of biallelic dysfunction versus 180,821 cases of monoallelic dysfunction (counted as gene-sample pairs across all samples). With a threshold of 4 or higher, those numbers become 3,006 and 176,116, respectively. Given the number of deficient samples reported in other similar studies, including the one we use as a reference, biallelic hits are the more sensible basis for determining deficiency. As for the threshold, since we are dealing with somatic events, I will continue with a score of 3 or higher.
Hit types among the 180,821 monoallelic hits (score ≥ 3):

    Hit type                       Count
    2x_mut_pathogenic                109
    deep_deletion                    392
    loh_only                      170182
    loh,mut_likely_pathogenic        553
    loh,mut_pathogenic              1904
    loh,mut_vus                     1481
    mut_likely_pathogenic            523
    mut_pathogenic                   972
    none                            4705
    Total                         180821

Hit types among the 4,986 biallelic hits (score ≥ 3):

    Hit type                       Count
    2x_mut_pathogenic                109
    deep_deletion                    392
    loh,mut_likely_pathogenic        553
    loh,mut_pathogenic              1904
    loh,mut_vus                     1481
    mut_likely_pathogenic             92
    mut_pathogenic                   180
    none                             275
    Total                           4986
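The gene-level rule can be sketched in Python as follows. The per-allele score scale is described in the scoring overview and is not reproduced here, so all scores below are made up; treating "monoallelic" as "at least one allele reaches the threshold" is an assumption, chosen because every biallelic hit type above also appears with the same count in the monoallelic breakdown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneCall:
    """Per-sample call for one gene: one score per allele (HYPOTHETICAL values)."""
    sample: str
    gene: str
    allele1_score: int
    allele2_score: int


def is_biallelic(call: GeneCall, threshold: int = 3) -> bool:
    # Both alleles must reach the threshold
    return call.allele1_score >= threshold and call.allele2_score >= threshold


def is_monoallelic(call: GeneCall, threshold: int = 3) -> bool:
    # ASSUMPTION: at least one allele reaches the threshold (biallelic calls included)
    return call.allele1_score >= threshold or call.allele2_score >= threshold


def count_hits(calls, threshold: int = 3) -> tuple[int, int]:
    """Return (biallelic, monoallelic) counts over all gene-sample calls."""
    n_bi = sum(is_biallelic(c, threshold) for c in calls)
    n_mono = sum(is_monoallelic(c, threshold) for c in calls)
    return n_bi, n_mono


# Toy calls with made-up scores
calls = [
    GeneCall("S1", "GENE_A", 5, 5),  # both alleles score high
    GeneCall("S1", "GENE_B", 3, 0),  # only one allele hit
    GeneCall("S2", "GENE_A", 4, 3),  # both >= 3, but only one >= 4
    GeneCall("S2", "GENE_C", 0, 0),  # no hit
]

print(count_hits(calls, threshold=3))  # (2, 3)
print(count_hits(calls, threshold=4))  # (1, 2)
```

As in the real counts, raising the threshold from 3 to 4 shrinks both the biallelic and the monoallelic totals.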

Moving on to selection criteria at the sample level: at this stage, I have only considered the core genes (n=112) of each pathway. In every sample and for each pathway, we count the number of genes that are likely dysfunctional (a score of 3 or more on both alleles). A dysfunctional gene that is a core component of more than one pathway is counted towards each of those pathways.
If a sample has a count of zero for all pathways (i.e. none of its core genes is dysfunctional), the sample is annotated as “no deficiency”. If two or more pathways share the highest number of dysfunctional core genes (no absolute maximum), the sample is annotated as “no consensus”. Finally, if one pathway has more dysfunctional core genes than all the others, the sample is annotated as deficient in that pathway.
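The sample-level rule above can be sketched in Python as a single function. The gene-to-pathway mapping in the usage lines is hypothetical and only illustrates the three possible outcomes.

```python
from collections import Counter


def classify_sample(dysfunctional_genes, core_pathways_by_gene):
    """Assign one DDR label to a sample.

    dysfunctional_genes: genes with a score >= 3 on both alleles in this sample.
    core_pathways_by_gene: gene -> set of pathways in which the gene is a CORE
        component (accessory-only genes are simply absent from the mapping).
    """
    counts = Counter()
    for gene in dysfunctional_genes:
        # A gene that is core in several pathways counts towards each of them
        for pathway in core_pathways_by_gene.get(gene, ()):
            counts[pathway] += 1

    if not counts:  # zero for every pathway
        return "no deficiency"

    top = max(counts.values())
    leaders = [p for p, n in counts.items() if n == top]
    if len(leaders) > 1:  # tie for the highest count
        return "no consensus"
    return leaders[0]  # unique maximum


# HYPOTHETICAL core-gene memberships, for illustration only
core = {"G1": {"HR"}, "G2": {"HR", "SSA"}, "G3": {"MMR"}}

print(classify_sample([], core))            # no deficiency
print(classify_sample(["G1", "G2"], core))  # HR (HR=2, SSA=1)
print(classify_sample(["G1", "G3"], core))  # no consensus (HR=1, MMR=1)
print(classify_sample(["G4"], core))        # no deficiency (G4 is not a core gene)
```

Passing only core memberships in the mapping keeps the "core genes only" restriction in one place rather than inside the counting loop.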

N.B. The SSA pathway is co-deficient in 54 four cases, but in none of them does it have the absolute maximum. This is because all 4 core genes in SSA are also core genes in other pathways, so SSA is unlikely to be selected.
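A small enumeration makes this concrete. Assume, hypothetically, that the four SSA core genes are shared with other pathways as in the toy mapping below (gene and partner-pathway names are placeholders), and that only SSA genes are hit. A single SSA hit always ties with its partner pathway, so SSA needs at least two dysfunctional core genes in the same sample, with different partners, to win. In real samples, hits in other genes raise the competing pathways' counts as well.

```python
from collections import Counter
from itertools import combinations

# HYPOTHETICAL mapping: four SSA core genes, each also core in another pathway.
core_pathways_by_gene = {
    "SSA_1": {"SSA", "PW_A"},
    "SSA_2": {"SSA", "PW_A"},
    "SSA_3": {"SSA", "PW_B"},
    "SSA_4": {"SSA", "PW_C"},
}


def ssa_is_unique_max(hit_genes):
    """True if SSA alone has the highest count of dysfunctional core genes."""
    counts = Counter(p for g in hit_genes for p in core_pathways_by_gene[g])
    top = max(counts.values())
    return counts["SSA"] == top and sum(n == top for n in counts.values()) == 1


# Enumerate every non-empty set of dysfunctional SSA genes (no other genes hit)
genes = sorted(core_pathways_by_gene)
winning = [
    subset
    for k in range(1, len(genes) + 1)
    for subset in combinations(genes, k)
    if ssa_is_unique_max(subset)
]

print(len(winning))                       # 10 of the 15 subsets
print(any(len(s) == 1 for s in winning))  # False: a single hit ties with its partner
print(("SSA_1", "SSA_2") in winning)      # False: both partners are PW_A, so SSA ties PW_A
```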

Overall, the number of deficient samples is lower than in the work of Pleasance et al. This could partly be explained by the fact that our cohort (n=7049) includes both primary (n=2310) and metastatic (n=4739) cancers, whereas their cohort consists only of metastatic tumors (n=570).
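For scale, the primary tumors, which have no counterpart in the reference cohort, make up roughly a third of our cohort:

$$
2310 + 4739 = 7049, \qquad \frac{2310}{7049} \approx 0.328, \qquad \frac{4739}{7049} \approx 0.672
$$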

Compare TMB of DDR-deficient groups